-
Notifications
You must be signed in to change notification settings - Fork 16
fix: add clean-data playbook to prevent storage-related deployment failures #81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: add clean-data playbook to prevent storage-related deployment failures #81
Conversation
ch4r10t33r
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We appear to be double cleaning.
When clean_data=true is passed via site.yml, data gets cleaned:
- First in clean-data.yml (step 1)
- Then again in each role's tasks
Should we consider using clean-data.yml for pre-deployment cleaning alone and remove role level cleaning when using site.yml?
| msg: "Node key file {{ node_name }}.key not found in {{ genesis_dir }}" | ||
| when: not (node_key_stat.stat.exists | default(false)) | ||
|
|
||
| - name: Check if node data directory has contents |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we extract this into a shared task file under roles/common/tasks/clean-node-data.yml and use include_tasks in each role.
- name: Clean node data if requested
include_tasks: "{{ playbook_dir }}/../roles/common/tasks/clean-node-data.yml"
when: clean_data | default(false) | bool
This will modularize the code (instead of copying the same content across 5 different roles (which is likely to increase in future).
| path: "{{ data_dir }}/{{ node_name }}" | ||
| state: absent | ||
| when: clean_data | default(false) | bool | ||
| when: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The find task to check if a directory has contents before deletion is redundant. Ansible's file: state=absent is idempotent. It will succeed whether the directory exists, is empty, or has contents.
|
|
||
| - name: Extract all node names | ||
| shell: | | ||
| yq eval '.validators[].name' {{ validator_config_file }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nit, but can we please add a check to validate if yq is installed before using it here?
This one was tricky. Both When a server runs out of storage, cleaning must be the first remote task for all deployment playbooks. Otherwise, Ansible fails to create a temp folder on the server: Currently, |
|
Should we move helper files like |
While running devnet-1, one of the client nodes reached its storage limit. A new ansible deployment failed when copying genesis files to the server due to insufficient storage. To restart the devnet, manual cleaning was required. This PR adds centralized data cleaning that runs before genesis generation when using the
--cleanDataflag.The clean-data playbook can be called independently for cleaning up node data directories. It supports all deployment modes: site.yml, deploy-nodes.yml, and tag-based execution.
Scenarios tested on Lighthouse server